Minot
StepSearch: Igniting LLMs Search Ability via Step-Wise Proximal Policy Optimization
Wang, Ziliang; Zheng, Xuhui; An, Kang; Ouyang, Cijun; Cai, Jialu; Wang, Yuhang; Wu, Yichao
Efficient multi-hop reasoning requires agents based on Large Language Models (LLMs) to acquire high-value external knowledge iteratively. Previous work has used reinforcement learning (RL) to train LLMs to perform search-based document retrieval, achieving notable improvements in QA performance, but these models underperform on complex multi-hop QA because they are trained with sparse rewards from a single global signal. To address this gap, we introduce StepSearch, a framework for search LLMs trained with a step-wise proximal policy optimization method. It adds richer, more detailed intermediate search rewards and token-level process supervision, based on information gain and redundancy penalties, to better guide each search step. We constructed a fine-grained question-answering dataset containing sub-question-level search trajectories, derived from open-source datasets through a data pipeline. On standard multi-hop QA benchmarks, StepSearch significantly outperforms global-reward baselines, achieving absolute improvements of 11.2% and 4.2% for 3B and 7B models, respectively, over various RL-based search baselines while using only 19k training examples. This demonstrates the effectiveness of fine-grained, step-wise supervision in optimizing deep-search LLMs. Our code will be released at https://github.com/Zillwang/StepSearch.
- Europe > Germany (0.14)
- Asia > Malaysia (0.14)
- Asia > Middle East > Kuwait (0.05)
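The StepSearch abstract describes step-level rewards built from information gain and redundancy penalties but gives no formulas. As a rough illustration only, the sketch below assumes information gain is the increase in coverage of a hypothetical set of gold supporting documents, and redundancy is the share of a step's results that were already retrieved earlier. It then places each step reward on the last token of that step's search query and the global outcome reward on the final token, a common convention for token-level rewards in PPO. `SearchState`, `step_reward`, `token_rewards`, and `redundancy_weight` are hypothetical names, not the authors' released code.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class SearchState:
    """Tracks which documents the agent has retrieved across search steps.

    Assumption: gold_docs is a hypothetical set of supporting-document IDs
    for the question; StepSearch's actual signal may differ.
    """
    gold_docs: Set[str]
    seen: Set[str] = field(default_factory=set)

    def coverage(self) -> float:
        """Fraction of gold documents retrieved so far."""
        if not self.gold_docs:
            return 0.0
        return len(self.seen & self.gold_docs) / len(self.gold_docs)


def step_reward(state: SearchState, retrieved: List[str],
                redundancy_weight: float = 0.5) -> float:
    """Hypothetical per-step reward: information gain minus a redundancy penalty.

    gain    = increase in gold-document coverage caused by this step
    penalty = share of this step's results that were already seen before it
    """
    if not retrieved:
        return 0.0
    before = state.coverage()
    # Count results already retrieved in earlier steps (checked before updating).
    redundant = sum(1 for doc in retrieved if doc in state.seen)
    state.seen.update(retrieved)
    gain = state.coverage() - before
    penalty = redundant / len(retrieved)
    return gain - redundancy_weight * penalty


def token_rewards(seq_len: int, step_end_positions: List[int],
                  step_rewards: List[float], final_reward: float) -> List[float]:
    """Spread rewards over a token sequence: each step reward goes on the last
    token of its search query, and the outcome reward goes on the final token."""
    rewards = [0.0] * seq_len
    for pos, r in zip(step_end_positions, step_rewards):
        rewards[pos] += r
    rewards[-1] += final_reward
    return rewards


if __name__ == "__main__":
    state = SearchState(gold_docs={"d1", "d2", "d3", "d4"})
    r1 = step_reward(state, ["d1", "d2", "x"])   # new coverage 0.5, nothing redundant
    r2 = step_reward(state, ["d1", "d2", "d3"])  # +0.25 coverage, 2 of 3 results redundant
    print(r1, r2)
    print(token_rewards(6, [1, 3], [r1, r2], final_reward=1.0))
```

In this sketch, a second search that mostly repeats earlier results earns a small or negative reward even though it finds one new gold document, which is the kind of per-step signal a single global reward cannot express.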
David Icke Socioemotional "Thought Crimes" in American Schools: Tracking Student SEL Data for Precrime
As a result of federal initiatives to "get tough on crime," such as the Reagan Administration's War on Drugs and the Clinton Administration's "Three Strikes" laws, the total number of incarcerated Americans more than quadrupled, from roughly 500,000 inmates in 1980 to 2.2 million inmates in 2015. During these decades, black Americans were incarcerated at a rate five times higher than that of white Americans. Although a new 2019 US Bureau of Justice Statistics (BJS) report suggests that the racial disparity between white and black incarceration rates is "narrowing," a Pew Research Center review of BJS statistics reveals that this 2019 report "counts only inmates sentenced to more than a year." Moreover, "Whites accounted for 64% of adults but 30% of prisoners. . . . In 2017, there were 1,549 black prisoners for every 100,000 black adults, nearly six times the imprisonment rate for whites (272 per 100,000)."
- North America > United States > California (0.14)
- North America > United States > Pennsylvania (0.04)
- North America > United States > North Dakota > Ward County > Minot (0.04)
- Instructional Material (0.93)
- Research Report (0.70)